NSF PAR Search | NSF Public Access Repository

Vision-based perception systems are crucial for profitable autonomous-driving vehicle products. High accuracy in such perception systems is being enabled by rapidly evolving convolution neural networks (CNNs). To achieve a better understanding of its surrounding environment, a vehicle must be provided with full coverage via multiple cameras. However, when processing multiple video streams, existing CNN frameworks often fail to provide enough inference performance, particularly on embedded hardware constrained by size, weight, and power limits. This paper presents the results of an industrial case study that was conducted to re-think the design of CNN software to better utilize available hardware resources. In this study, techniques such as parallelism, pipelining, and the merging of per-camera images into a single composite image were considered in the context of a Drive PX2 embedded hardware platform. The study identifies a combination of techniques that can be applied to increase throughput (number of simultaneous camera streams) without significantly increasing per-frame latency (camera to CNN output) or reducing per-stream accuracy.

Making OpenVX Really "Real Time"

https://doi.org/10.1109/RTSS.2018.00018

Yang, Ming; Amert, Tanya; Yang, Kecheng; Otterness, Nathan; Anderson, James H.; Smith, F. Donelson; Wang, Shige (December 2018, 2018 IEEE Real-Time Systems Symposium (RTSS))

OpenVX is a recently ratified standard that was expressly proposed to facilitate the design of computer-vision (CV) applications used in real-time embedded systems. Despite its real-time focus, OpenVX presents several challenges when validating real-time constraints. Many of these challenges are rooted in the fact that OpenVX only implicitly defines any notion of a schedulable entity. Under OpenVX, CV applications are specified in the form of processing graphs that are inherently considered to execute monolithically end-to-end. This monolithic execution hinders parallelism and can lead to significant processing-capacity loss. Prior work partially addressed this problem by treating graph nodes as schedulable entities, but under OpenVX, these nodes represent rather coarse-grained CV functions, so the available parallelism that can be obtained in this way is quite limited. In this paper, a much more fine-grained approach for scheduling OpenVX graphs is proposed. This approach was designed to enable additional parallelism and to eliminate schedulability-related processing-capacity loss that arises when programs execute on both CPUs and graphics processing units (GPUs). Response-time analysis for this new approach is presented and its efficacy is evaluated via a case study involving an actual CV application.

Full Text Available

Search for: All records